Designing and Evaluating a Russian Tagset

Authors

Serge Sharoff

Mikhail Kopotev

Tomaz Erjavec

Anna Feldman

Dagmar Divjak

Abstract

This paper reports the principles behind designing a tagset to cover Russian morphosyntactic phenomena, modifications of the core tagset, and its evaluation. The tagset and associated morphosyntactic specifications are based on the MULTEXT-East framework, while the decisions in designing it were aimed at achieving a balance between parameters important for linguists and the possibility to detect and disambiguate them automatically. The final tagset contains about 600 tags and achieves about 95% accuracy on the disambiguated portion of the Russian National Corpus. We have also produced a test set of tagging models and corpora that can be shared with other
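MULTEXT-East morphosyntactic descriptions are compact positional strings: the first character names the category, and each following position holds a one-character value code for a fixed attribute, with a hyphen marking an attribute that does not apply. The sketch below shows how such a tag could be unpacked into features. The attribute table is a simplified, hypothetical one for nouns only; it is an assumption made for illustration and is not the specification described in the paper. The function name `decode_msd` is likewise hypothetical.

```python
# Simplified, hypothetical attribute table for nouns (an assumption for
# illustration only; NOT the actual specification from the paper).
NOUN_ATTRIBUTES = [
    ("Type", {"c": "common", "p": "proper"}),
    ("Gender", {"m": "masculine", "f": "feminine", "n": "neuter"}),
    ("Number", {"s": "singular", "p": "plural"}),
    ("Case", {
        "n": "nominative", "g": "genitive", "d": "dative",
        "a": "accusative", "i": "instrumental", "l": "locative",
    }),
]

# Map the category letter (first character of the tag) to its name
# and its ordered list of positional attributes.
CATEGORIES = {"N": ("Noun", NOUN_ATTRIBUTES)}


def decode_msd(msd):
    """Decode a positional tag string into a feature dictionary."""
    if not msd or msd[0] not in CATEGORIES:
        raise ValueError(f"unknown or empty tag: {msd!r}")
    name, attributes = CATEGORIES[msd[0]]
    features = {"Category": name}
    # Pair each position after the category letter with its attribute.
    for (attribute, values), code in zip(attributes, msd[1:]):
        if code == "-":
            continue  # a hyphen means the attribute is not applicable
        features[attribute] = values.get(code, f"unknown({code})")
    return features


if __name__ == "__main__":
    print(decode_msd("Ncmsn"))
    print(decode_msd("Nc-pg"))
```

A positional encoding of this kind keeps each tag short while letting a tool recover individual features, which matters when a tagset grows to several hundred tags.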

Similar resources

A Positional Tagset for Russian

Fusional languages have rich inflection. As a consequence, tagsets capturing their morphological features are necessarily large. A natural way to make a tagset manageable is to use a structured system. In this paper, we present a positional tagset for describing morphological properties of Russian. The tagset was inspired by the Czech positional system (Hajič, 2004). We have used preliminary ve...

Towards a reference tagset for Japanese

This is a progress report on ongoing research aimed at proposing a ‘reference’ morphosyntactic part-of-speech tagset for the Japanese language. Such a tagset should be linguistically motivated, explicit, broadly applicable, and computationally tractable. Being well defined, such a tagset should be easily adapted in specific ways (e.g. limited, extended or modified). The author is currently atte...

Designing a Common POS-Tagset Framework for Indian Languages

Research in Parts-of-Speech (POS) tagset design for European and East Asian languages started with a mere listing of important morphosyntactic features in one language and has matured in later years towards hierarchical tagsets, decomposable tags, common framework for multiple languages (EAGLES) etc. Several tagsets have been developed in these languages along with large amount of annotated dat...

Building a Dependency Parsing Model for Russian with MaltParser and MyStem Tagset

The paper describes a series of experiments on building a dependency parsing model using MaltParser, the SynTagRus treebank of Russian, and the morphological tagger Mystem. The experiments have two purposes. The first one is to train a model with a reasonable balance of quality and parsing time. The second one is to produce user-friendly software which would be practical for obtaining quick res...

Evaluating Distributional Properties of Tagsets

We investigate which distributional properties should be present in a tagset by examining different mappings of various current part-of-speech tagsets, looking at English, German, and Italian corpora. Given the importance of distributional information, we present a simple model for evaluating how a tagset mapping captures distribution, specifically by utilizing a notion of frames to capture the ...

Publication date: 2008
